本文考虑了深神经网络中随机矩阵普遍性的几个方面。在最近的实验工作中,我们使用与局部统计相关的随机矩阵的普遍特性,以基于其Hessians的现实模型来获得对深神经网络的实际含义。特别是,我们得出了深度神经网络光谱中异常值的普遍方面,并证明了随机矩阵局部定律在流行的预处理梯度下降算法中的重要作用。我们还通过基于统计物理学和随机矩阵理论的工具的一般参数,对深度神经网络损失表面的见解。
translated by 谷歌翻译
我们调查人工神经网络的损失表面Hessians的局部光谱统计数据,在那里我们发现跨多个网络架构和数据集的高斯正交集合统计数据非常一致。这些结果阐述了随机矩阵理论对神经网络建模的适用性,并提出了在深度学习中损失表面研究中的先前未被识别的作用。通过这些观察的启发,我们提出了一种新颖的神经网络的真正损失表面模型,与我们的观察结果一致,这允许Hessian光谱密度在实践中广泛观察到具有秩的退化性和异常值,并预测损失梯度的独立性越来越长重量空间中距离的函数。我们进一步调查了神经网络中真正损失表面的重要性,并与以前的工作相比,找到了定位全球最小值的指数硬度对实现最新性能的实际后果。
translated by 谷歌翻译
我们研究了使用尖刺,现场依赖的随机矩阵理论研究迷你批次对深神经网络损失景观的影响。我们表明,批量黑森州的极值值的大小大于经验丰富的黑森州。我们还获得了类似的结果对Hessian的概括高斯牛顿矩阵近似。由于我们的定理,我们推导出作为批量大小的最大学习速率的分析表达式,为随机梯度下降(线性缩放)和自适应算法(例如ADAM(Square Root Scaling)提供了通知实际培训方案,例如光滑,非凸深神经网络。虽然随机梯度下降的线性缩放是在我们概括的更多限制性条件下导出的,但是适应优化者的平方根缩放规则是我们的知识,完全小说。随机二阶方法和自适应方法的百分比,我们得出了最小阻尼系数与学习率与批量尺寸的比率成比例。我们在Cifar-$ 100 $和ImageNet数据集上验证了我们的VGG / WimerEsnet架构上的索赔。根据我们对象检的调查,我们基于飞行学习率和动量学习者开发了一个随机兰齐齐竞争,这避免了对这些关键的超参数进行昂贵的多重评估的需求,并在预残留的情况下显示出良好的初步结果Cifar的architecure - $ 100 $。
translated by 谷歌翻译
我们调查了Wigner半圈和Marcenko-Pastur分布,通常用于深度神经网络理论分析,匹配经验观察到的光谱密度。我们发现甚至允许异常值,观察到的光谱形状强烈地偏离了这种理论预测。这提出了关于这些模型在深度学习中的有用性的重要问题。我们进一步表明,理论结果,例如关键点的分层性质,强烈依赖于这些限制光谱密度的确切形式的使用。我们考虑两个新的矩阵集合;随机Wigner / Wishart集合产品和渗透的Wigner / Wishart集合,两者都更好地匹配观察光谱。它们还给出了原点的大型离散光谱峰,为观察提供了一种理论解释,即各种Optima可以通过一维的低损耗值连接。我们进一步表明,在随机矩阵产品的情况下,离散光谱分量的重量为0美元取决于权重矩阵的尺寸的比率。
translated by 谷歌翻译
我们使用高斯过程扰动模型在高维二次上的真实和批量风险表面之间的高斯过程扰动模型分析和解释迭代平均的泛化性能。我们从我们的理论结果中获得了三个现象\姓名:}(1)将迭代平均值(ia)与大型学习率和正则化进行了改进的正规化的重要性。 (2)对较少频繁平均的理由。 (3)我们预计自适应梯度方法同样地工作,或者更好,而不是其非自适应对应物的迭代平均值。灵感来自这些结果\姓据{,一起与}对迭代解决方案多样性的适当正则化的重要性,我们提出了两个具有迭代平均的自适应算法。与随机梯度下降(SGD)相比,这些结果具有明显更好的结果,需要较少调谐并且不需要早期停止或验证设定监视。我们在各种现代和古典网络架构上展示了我们对CiFar-10/100,Imagenet和Penn TreeBank数据集的方法的疗效。
translated by 谷歌翻译
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
translated by 谷歌翻译
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Universit\'e de Montr\'eal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).
translated by 谷歌翻译
Feature acquisition algorithms address the problem of acquiring informative features while balancing the costs of acquisition to improve the learning performances of ML models. Previous approaches have focused on calculating the expected utility values of features to determine the acquisition sequences. Other approaches formulated the problem as a Markov Decision Process (MDP) and applied reinforcement learning based algorithms. In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as a MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search. With Proximal Policy Optimization and Deep Q-Network algorithms as benchmark, we show the effectiveness of our proposed approach with experimental study.
translated by 谷歌翻译
360-degree panoramic videos have gained considerable attention in recent years due to the rapid development of head-mounted displays (HMDs) and panoramic cameras. One major problem in streaming panoramic videos is that panoramic videos are much larger in size compared to traditional ones. Moreover, the user devices are often in a wireless environment, with limited battery, computation power, and bandwidth. To reduce resource consumption, researchers have proposed ways to predict the users' viewports so that only part of the entire video needs to be transmitted from the server. However, the robustness of such prediction approaches has been overlooked in the literature: it is usually assumed that only a few models, pre-trained on past users' experiences, are applied for prediction to all users. We observe that those pre-trained models can perform poorly for some users because they might have drastically different behaviors from the majority, and the pre-trained models cannot capture the features in unseen videos. In this work, we propose a novel meta learning based viewport prediction paradigm to alleviate the worst prediction performance and ensure the robustness of viewport prediction. This paradigm uses two machine learning models, where the first model predicts the viewing direction, and the second model predicts the minimum video prefetch size that can include the actual viewport. We first train two meta models so that they are sensitive to new training data, and then quickly adapt them to users while they are watching the videos. Evaluation results reveal that the meta models can adapt quickly to each user, and can significantly increase the prediction accuracy, especially for the worst-performing predictions.
translated by 谷歌翻译
This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on the annotation of the quotation, co-reference resolution, and sentiment annotation in SETimes news corpus in Croatian and on the analysis of its language-specific differences compared to English. From this, a list of the phenomena that require special attention when performing these annotations is derived. The generated corpus with quotation features annotations can be used for multiple tasks in the field of Natural Language Processing.
translated by 谷歌翻译